Method and system for detecting and/or predicting biological anomalies

ABSTRACT

Biological anomalies are detected and/or predicted by analyzing input biological or physical data using a data processing routine. The data processing routine includes a set of application parameters associated with biological data correlating with the biological anomalies. The data processing routine uses an algorithm to produce a data series, e.g., a PD2i data series. The data series is used to detect or predict the onset of the biological anomalies. To reduce noise in the data series, the slope is set to a predetermined number if it is less than a predetermined value. To further reduce noise, a noise interval within the data series is determined and, if the noise interval is within a predetermined range, the data series is divided by another predetermined number, and new values are produced for the data series.

RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 10/353,849, filed Jan. 29, 2003.

BACKGROUND

The present invention is directed to a method and system for evaluating biological or physical data. More particularly, the present invention is directed to a system and method for evaluating biological or physical data for detecting and/or predicting biological anomalies.

The recording of electrophysiological potentials has been available to the field of medicine since the invention of the string galvanometer. Since the 1930's, electrophysiology has been useful in diagnosing cardiac injury and cerebral epilepsy.

The state of the art in modern medicine shows that analysis of R-R intervals observed in the electrocardiogram, or of spikes seen in the electroencephalogram, can predict future clinical outcomes, such as sudden cardiac death or epileptic seizures. Such analyses and predictions are statistically significant when used to discriminate outcomes between large groups of patients who either do or do not manifest the predicted outcome, but known analytic methods are not very accurate when used for individual patients. This general failure of known analytic measures is attributed to the large numbers of false predictions; i.e., the measures have low statistical sensitivity and specificity in their predictions.

It is usually known that something “pathological” is going on in the biological system under study, but currently available analytic methods are not sensitive and specific enough to permit utility in the individual patient.

The inaccuracy problems prevalent in the art are due to current analytic measures (1) being stochastic (i.e., based on random variation in the data), (2) requiring stationarity (i.e., the system generating the data cannot change during the recording), and (3) being linear (i.e., insensitive to nonlinearities in the data, which are referred to in the art as “chaos”).

Many theoretical descriptions of dimensions are known, such as “D0” (Hausdorff dimension), “D1” (information dimension), and “D2” (correlation dimension).

D2 enables the estimation of the dimension of a system, or its number of degrees of freedom, from an evaluation of a sample of data generated. Several investigators have used D2 on biological data. However, it has been shown that the presumption of data stationarity cannot be met.

Another theoretical description, the Pointwise Scaling Dimension or “D2i”, has been developed that is less sensitive to the non-stationarities inherent in data from the brain, heart, or skeletal muscle. This is perhaps a more useful estimate of dimension for biological data than the D2. However, D2i still has considerable errors of estimation that might be related to data non-stationarities.

A Point Correlation Dimension algorithm (PD2) has been developed that is superior to both the D2 and D2i in detecting changes in dimension in non-stationary data (i.e., data made by linking subepochs from different chaotic generators).

An improved PD2 algorithm, labeled the “PD2i” to emphasize its time-dependency, has been developed. This uses an analytic measure that is deterministic and based on caused variation in the data. The algorithm does not require data stationarity and actually tracks non-stationary changes in the data. Also, the PD2i is sensitive to chaotic as well as non-chaotic, linear data. The PD2i is based on previous analytic measures that are, collectively, the algorithms for estimating the correlation dimension, but it is insensitive to data non-stationarities. Because of this feature, the PD2i can predict clinical outcomes with high sensitivity and specificity that the other measures cannot.

The PD2i algorithm is described in detail in U.S. Pat. Nos. 5,709,214 and 5,720,294, hereby incorporated by reference. For ease of understanding, a brief description of PD2i and a comparison of this measure with others are provided below.

The model for the PD2i is C(r, n, ref*) ~ r^D2, where ref* is an acceptable reference point from which to make the various m-dimensional reference vectors, because these will have a scaling region of maximum length PL that meets the linearity (LC) and convergence (CC) criteria. Because each ref* begins with a new coordinate in each of the m-dimensional reference vectors, and because this new coordinate could be of any value, the PD2i's may be independent of each other for statistical purposes.
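
As an illustration of this model, the following is a minimal sketch of how the correlation integral for a single reference point might be computed from time-delay-embedded vectors. The function name, the choice of maximum norm, and tau=1 are assumptions made for illustration; they are not prescribed by this description.

```python
import numpy as np

def correlation_integral(x, ref, m, tau=1):
    """Sketch: sorted i-j vector-difference lengths for one reference
    point ref* at embedding dimension m. The correlation integral
    C(r, n, ref*) is the cumulative count of these lengths below r,
    and the slope of log C vs log r in the scaling region estimates D2."""
    n = len(x) - (m - 1) * tau
    # m-dimensional time-delay vectors
    vectors = np.column_stack([x[k * tau : k * tau + n] for k in range(m)])
    # i-j vector-difference lengths (maximum norm chosen for illustration)
    diffs = np.max(np.abs(vectors - vectors[ref]), axis=1)
    return np.sort(np.delete(diffs, ref))   # rank-ordered lengths
```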

The PD2i algorithm limits the range of the small log-r values over which linear scaling and convergence are judged by the use of a parameter called Plot Length. The value of this entry determines, for each log-log plot, beginning at the small log-r end, the percentage of points over which the linear scaling region is sought.

In non-stationary data, the small log-r values between a fixed reference vector (i-vector) in a subepoch that is, say, a sine wave, when subtracted from multiple j-vectors in, say, a Lorenz subepoch, will not make many small vector-difference lengths, especially at the higher embedding dimensions. That is, there will not be abundant small log-r vector-difference lengths relative to those that would be made if the j-vector for the Lorenz subepoch was instead in a sine wave subepoch. When all of the vector-difference lengths from the non-stationary data are mixed together and rank ordered, only those small log-r values between subepochs that are stationary with respect to the one containing the reference vector will contribute to the scaling region, that is, to the region that will be examined for linearity and convergence. If there is significant contamination of this small log-r region by other non-stationary subepochs, then the linearity or convergence criterion will fail, and that estimate will be rejected from the PD2i mean.

The PD2i algorithm introduced to the art the idea that only the smallest initial part of the linear scaling region should be considered if data non-stationarities exist (i.e., as they always do in biological data). This is because when the j-vectors lie in a subepoch of data that is the same species as the one the i-vector (reference vector) is in, then and only then will the smallest log-r vectors be made abundantly, that is, in the limit or as data length becomes large. Thus, to avoid contamination in the correlation integral by species of data that are non-stationary with respect to the species the reference vector is in, one skilled in the art must look only at the slopes in the correlation integral that lie just a short distance beyond the “floppy tail”.

The “floppy tail” is the very smallest log-r range in which linear scaling does not occur, due to the lack of points in this part of the correlation integral resulting from finite data length. Thus, by restricting the PD2i scaling to the smallest part of the log-r range above the “floppy tail,” the PD2i algorithm becomes insensitive to data non-stationarities. Note that the D2i always uses the whole linear scaling region, which will always be contaminated if non-stationarities exist in the data.

FIG. 1A shows a plot of log C(r, n, nref*) versus log r. This illustrates a crucial idea behind the PD2i algorithm: it is only the smallest initial part of the linear scaling region that should be considered if data non-stationarities exist. In this case the data were made by concatenating 1200-point data subepochs from a sine wave, Lorenz data, a sine wave, Henon data, a sine wave, and random noise. The reference vector was in the Lorenz subepoch. For the correlation integral where the embedding dimension m=1, the segment for the floppy tail (“FT”) is avoided by a linearity criterion of LC=0.30; the linear scaling region for the entire interval (D2i) is determined by plot length PL=1.00, convergence criterion CC=0.40, and minimum scaling MS=10 points. The species-specific scaling region where the i- and j-vectors are both in the Lorenz data (PD2i) is set by changing plot length to PL=0.15 or lower. Note that at the higher embedding dimensions (e.g., m=12), after convergence of slope vs embedding dimension has occurred, the slope for the PD2i segment is different from that of D2i. This is because the upper part of the D2i segment (D2i-PD2i) is contaminated by non-stationary i-j vector differences, where the j-vector is in a non-stationary species of data with respect to the species the i-vector is in.

This short-distance slope estimate for PD2i is perfectly valid for any log-log plot of a linear region; it does not matter whether one uses all data points or only the initial segment to determine the slope. Thus, by empirically setting Plot Length to a small interval above the “floppy tail” (the latter of which is avoided by setting the linearity criterion, LC), non-stationarities can be tracked in the data with only a small error, an error which is due entirely to finite data length, and not to contamination by non-stationarities.

Thus, by appropriate adjustments in the algorithm to examine only that part of the scaling region just above the “floppy tail”, which is determined by (1) the Linearity Criterion, LC, (2) the Minimum Scaling criterion, MS, and (3) the Plot Length criterion, PL, one skilled in the art can eliminate the sensitivity of the measure to data non-stationarities.
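
The following sketch shows one way these three criteria might be applied to a log-log correlation integral. The helper is hypothetical; the LC and PL semantics are taken from the FIG. 6 description later in this text (LC=0.30 meaning a deviation of ±15% about the mean slope, PL limiting the fit to the smallest fraction of the plot, MS=10 points minimum).

```python
import numpy as np

def select_scaling_region(log_r, log_C, LC=0.30, PL=0.15, MS=10):
    """Sketch: pick the initial linear scaling segment just above the
    'floppy tail'. LC bounds the allowed deviation of local slope from
    the mean slope; PL limits the fit to the smallest fraction of the
    plot; MS is the minimum number of points required."""
    # Plot Length: keep only the smallest PL fraction of the log-r range
    n_pl = max(int(len(log_r) * PL), MS)
    r, C = log_r[:n_pl], log_C[:n_pl]
    slopes = np.gradient(C, r)               # local slopes along the integral
    mean_slope = np.mean(slopes)
    # Linearity Criterion: local slopes within +/-(LC/2) of the mean slope
    ok = np.abs(slopes - mean_slope) <= (LC / 2) * abs(mean_slope)
    # find the longest run of acceptable points above the floppy tail
    best_start = best_len = run_len = 0
    for i, flag in enumerate(ok):
        run_len = run_len + 1 if flag else 0
        if run_len > best_len:
            best_start, best_len = i - run_len + 1, run_len
    if best_len < MS:
        return None                           # fails Minimum Scaling: reject
    seg = slice(best_start, best_start + best_len)
    slope, _ = np.polyfit(r[seg], C[seg], 1)  # slope of the accepted segment
    return slope
```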

This is the “trick” of how to make the j-vectors come from the same data species that the i-vector is in, and this can be proven empirically by placing a graphics marker on the i- and j-vectors and observing the markers in the correlation integral. This initial part of the scaling region is seen mathematically to be uncontaminated only in the limit, but practically speaking it works very well for finite data. This can be proven computationally with concatenated data. When the PD2i is used on concatenated subepochs of data made by sine, Lorenz, Henon, and other types of known linear and nonlinear data generators, the short scaling segment will have vector-difference lengths made only by i- and j-vector differences that are stationary with respect to each other; that is, the errors for 1,200-point subepochs are found to be less than 5.0% from their values at the limit, and these errors are due to the finite data length, not scaling contamination.

FIG. 1B illustrates a comparison of the calculation of the degrees of freedom of a data series by two nonlinear algorithms, the Point Correlation Dimension (PD2i) and the Pointwise Scaling Dimension (D2i). Both of these algorithms are time-dependent and are more accurate than the classical D2 algorithm when used on non-stationary data. Most physiological data are nonlinear because of the way the system is organized (the mechanism is nonlinear). Physiological systems are inherently non-stationary because of uncontrolled neural regulations (e.g., suddenly thinking about something “fearful” while sitting quietly generating heartbeat data).

Non-stationary data can be made noise-free by linking separate data series generated by mathematical generators having different statistical properties. Physical generators will always have some low-level noise. The data shown in FIG. 1B (DATA) were made of sub-epochs of sine (S), Lorenz (L), Henon (H) and random (R) mathematical generators. The data series is non-stationary by definition, as each sub-epoch (S, L, H, R) has different stochastic properties, i.e., different standard deviations, but similar mean values. The PD2i and D2i results calculated for the data are seen in the two traces below it and are very different. The D2i algorithm is the closest comparison algorithm to PD2i, but it does not restrict the small log-r scaling region in the correlation integral, as does the PD2i. This scaling restriction is what makes the PD2i work well on non-stationary data.

The PD2i results shown in FIG. 1B, using default parameters (LC=0.3, CC=0.4, Tau=1, PL=0.15), are for 1,200 data-point sub-epochs. Each sub-epoch PD2i mean is within 4% of the known value of D2 calculated for each data type alone (using long data lengths). The known D2 values for S, L, H, and R data are, respectively, 1.00, 2.06, 1.26, and infinity. Looking at the D2i values, one sees quite different results (i.e., spurious results). Note that the D2i is the closest algorithm to PD2i, because it too is time-dependent. However, D2i requires data stationarity, as does the D2 value itself. For stationary data, D2=D2i=PD2i. Only the PD2i tracks the correct number of degrees of freedom for non-stationary data. The single value of D2 calculated for the same non-stationary data is approximated by the mean of the D2i values shown.
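
For readers wishing to reproduce such a test series, a minimal sketch of concatenating noise-free sub-epochs from the four generators is given below. The Lorenz and Henon parameters shown are the standard textbook values, and the sine period and integration step are arbitrary illustrative choices not specified in this description.

```python
import numpy as np

def make_test_series(n=1200):
    """Sketch: concatenated 1200-point sub-epochs of sine (S), Lorenz (L),
    Henon (H), and random (R) data, non-stationary by construction."""
    t = np.arange(n)
    s = np.sin(2 * np.pi * t / 50)                   # S: sine wave

    x, y, z, dt = 1.0, 1.0, 1.0, 0.01                # L: Lorenz x-component
    lorenz = np.empty(n)
    for i in range(n):
        x, y, z = (x + dt * 10.0 * (y - x),
                   y + dt * (x * (28.0 - z) - y),
                   z + dt * (x * y - (8.0 / 3.0) * z))
        lorenz[i] = x

    hx, hy = 0.0, 0.0                                # H: Henon x-component
    henon = np.empty(n)
    for i in range(n):
        hx, hy = 1.0 - 1.4 * hx * hx + hy, 0.3 * hx
        henon[i] = hx

    r = np.random.default_rng(0).standard_normal(n)  # R: random noise

    center = lambda a: a - a.mean()  # similar means, differing SDs
    return np.concatenate([center(s), center(lorenz), center(henon), center(r)])
```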

For analysis by the PD2i, an electrophysiological signal is amplified (gain of 1,000) and digitized (1,000 Hz). The digitized signal may be further reduced (e.g., conversion of ECG data to RR-interval data) prior to processing. Analysis of RR-interval data has been repeatedly found to enable risk prediction between large groups of subjects with different pathological outcomes (e.g., ventricular fibrillation “VF” or ventricular tachycardia “VT”). It has been shown that, using sampled RR data from high-risk patients, PD2i could discriminate those that later went into VF from those that did not.

For RR-interval data made from a digital ECG that is acquired with the best low-noise preamps and fast 1,000-Hz digitizers, there is still a low level of noise that can cause problems for nonlinear algorithms. The algorithm used to make the RR-intervals can also lead to increased noise. The most accurate of all RR-interval detectors uses a 3-point running “convexity operator.” For example, 3 points in a running window that goes through the entire data can be adjusted to maximize its output when it exactly straddles an R-wave peak; point 1 is on the pre-R-wave baseline, point 2 is atop the R-wave, point 3 is again on the baseline. The location of point 2 in the data stream correctly identifies each R-wave peak as the window goes through the data. This algorithm will produce considerably more noise-free RR data than an algorithm which measures the point in time when an R-wave goes above a certain level or is detected when the dV/dt of each R-wave is maximum.
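
A sketch of such a 3-point convexity operator follows. The point spacing and the detection threshold are hypothetical tuning parameters; the description above says the window is adjusted to maximize its output but does not fix these values.

```python
import numpy as np

def detect_r_peaks(ecg, spacing=20, threshold=300):
    """Sketch: 3-point running 'convexity operator'. The output
    2*center - left - right is maximal when the window exactly
    straddles an R-wave peak; the center index marks the peak.
    `spacing` (samples between the 3 points) and `threshold` are
    hypothetical tuning parameters."""
    left = ecg[:-2 * spacing]
    center = ecg[spacing:-spacing]
    right = ecg[2 * spacing:]
    convexity = 2 * center - left - right    # large when center is atop a peak
    peaks = []
    for i in range(1, len(convexity) - 1):
        # local maximum of the operator output above threshold
        if convexity[i] > threshold and convexity[i] >= convexity[i - 1] \
                and convexity[i] > convexity[i + 1]:
            peaks.append(i + spacing)        # index of point 2 in the ECG
    return np.array(peaks)

# RR intervals (in samples at 1,000 Hz, i.e., milliseconds):
# rr = np.diff(detect_r_peaks(ecg))
```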

The best algorithmically calculated RR-intervals still will have a low level of noise that is observed to be approximately ±5 integers, peak-to-peak. This 10-integer range is out of 1,000 integers for an average R-wave peak (i.e., 1% noise). With poor electrode preparation, strong ambient electromagnetic fields, the use of moderately noisy preamps, or the use of lower digitizing rates, the low-level noise can easily increase. For example, at a gain where 1 integer = 1 msec (i.e., a gain of 25% of a full-scale 12-bit digitizer), this best noise level of 1% can easily double or triple if the user is not careful with the data acquisition. This increase in noise often happens in a busy clinical setting, and thus post-acquisition consideration of the noise level must be made.

There is thus a need for an improved analytic measure that takes noise into consideration.

SUMMARY

The objects, advantages and features of the present invention will become more apparent when reference is made to the following description taken in conjunction with the accompanying drawings.

According to exemplary embodiments, biological anomalies are detected and/or predicted by analyzing input biological or physical data using a data processing routine. The data processing routine includes a set of application parameters associated with biological data correlating with the biological anomalies. The data processing routine uses an algorithm to produce a data series, e.g., a PD2i data series, which is used to detect or predict the onset of the biological anomalies.

According to one aspect of the invention, to reduce noise in the data series, the slope is set to a predetermined number, e.g., zero, if it is less than a predetermined value, e.g., 0.5.

According to another aspect, a noise interval within the data series is determined and, if the noise interval is within a predetermined range, the data series is divided by another predetermined number, e.g., 2, and new values are produced for the data series.

According to exemplary embodiments, reducing the noise in the data series improves detection/prediction of biological anomalies such as cardiac arrhythmias, cerebral epileptic seizure, and myocardial ischemia.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates a plot of log C(r, n, nref*) versus log r for the conventional PD2i algorithm;

FIG. 1B illustrates a plot showing calculation of degrees of freedom (dimensions) by two time-dependent algorithms when applied to noise-free non-stationary data;

FIGS. 2A and 2B illustrate performance of PD2i when low-level noise is added to the non-stationary data;

FIG. 3 illustrates examination of low-level noise in two sets of RR-intervals made from two digital ECGs;

FIGS. 4A-4F illustrate low-level noise (insets) in the RR-intervals of control patients with acute myocardial infarctions;

FIGS. 4G-4L illustrate low-level noise in the RR-intervals of arrhythmic death patients;

FIG. 5A illustrates an exemplary flow diagram for the logic of the NCA applied to ECG data in cardiology;

FIG. 5B illustrates an exemplary flow diagram for the logic of the NCA applied to EEG data and related concepts in neurophysiology; and

FIG. 6 illustrates exemplary flowcharts for the NCA implemented in software according to an exemplary embodiment.

DETAILED DESCRIPTION

According to an exemplary embodiment, a technique has been developed to eliminate the contribution of low-level noise to nonlinear analytic measures, such as PD2i.

To see why the noise is important, reference is made to FIGS. 2A and 2B. In FIGS. 2A and 2B, the data are the same S, L, H, and R data as described with reference to FIG. 1B. By definition, these data do not have any noise, as they are made by mathematical generators.

In FIG. 2A, ±5 integers of low-level noise have been added to the non-stationary data series. The mean values of PD2i for each sub-epoch have not changed significantly, although there are a few large values out of the 1200 data points in each of the sub-epochs.

Adding noise of ±14 integers, however, now results in spurious PD2i values, the means of which are all approximately the same, as shown in FIG. 2B.

According to exemplary embodiments, the NCA (noise consideration algorithm) examines the low-level noise at high magnification (e.g., the y-axis is 40 integers full scale, the x-axis is 20 heartbeats full scale) and determines whether or not the noise is outside a predetermined range, for example, whether the dynamic range of the noise is greater than ±5 integers. If it is, then the data series is divided by a number that brings the noise back within the range of ±5 integers. In this example, the data series may be divided by 2, as only the low-level bit of the 12-bit integer data contains the noise.
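
A minimal sketch of this examination follows, assuming the noise in a 20-heartbeat segment is measured as the residual about a linear regression (consistent with the FIG. 3 description below); the function name and the use of the first segment only are illustrative simplifications.

```python
import numpy as np

def noise_consideration(rr, segment=20, limit=5):
    """Sketch of the NCA: estimate low-level noise in a `segment`-beat
    window; if its dynamic range exceeds +/-`limit` integers, divide the
    whole series by 2 (removing the low-level bit of 12-bit data)."""
    seg = rr[:segment].astype(float)
    t = np.arange(segment)
    slope, intercept = np.polyfit(t, seg, 1)   # slow variation (regression line)
    residual = seg - (slope * t + intercept)   # sawtooth noise around the line
    if residual.max() - residual.min() > 2 * limit:
        return rr // 2                         # noise too large: drop one bit
    return rr                                  # noise within +/-5 integers: keep
```

After such a division, the PD2i is recalculated on the new values, as described below.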

Since the linear scaling region of the correlation integral, calculated at embedding dimensions less than m=12, will have slopes less than 0.5 when made from low-level noise (e.g., with a dynamic range of ±5 integers), it is impossible to distinguish between low-level noise and true small-slope data. Conveniently, since slopes less than 0.5 are rarely encountered in biological data, the algorithmic setting of any slopes of 0.5 or less (observed in the correlation integral) to zero will eliminate the detection of these small natural slopes, and it will also eliminate the contribution of low-level noise to the PD2i values. It is this “algorithmic phenomenon” that explains the empirical data and accounts for the lack of effect of noise within the interval between −5 and 5 when added to noise-free data (FIG. 2A). Noise of slightly larger amplitude, however, will show the noise effects expected to occur with nonlinear algorithms (e.g., FIG. 2B).
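
In code, this rule reduces to a one-line guard on each slope read from the correlation integral (sketch only):

```python
def apply_slope_rule(slope, cutoff=0.5):
    """Sketch: slopes of 0.5 or less are indistinguishable from low-level
    noise and are algorithmically set to zero."""
    return 0.0 if slope <= cutoff else slope
```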

Based on application to physiological data, it is now understood that the low-level noise must always be considered and somehow kept within a predetermined range, such as between ±5 integers, or any other range that is relevant based on the empirical data. This consideration will prevent spurious increases of PD2i for low-dimensional data (i.e., data with few degrees of freedom), as illustrated by FIG. 2B. The proof of the concept lies in its simple explanation (the “algorithmic phenomenon”), but perhaps even more convincing are the empirical data that support the use of an NCA. These data will now be presented.

The upper portion of FIG. 3 shows clinical RR-interval data from a patient who died of arrhythmic death (AD). A small segment of 20 heartbeats of the RR-interval data is magnified and shown at the bottom portion of FIG. 3. The linear regression represents the slow variation in the signal in the segment of data, while the up-and-down sawtooth variations represent the noise. Both ECGs and RRs appeared similar, but low-level observations of the data (20 heartbeats, 40-integer y-axis) revealed that one had data variation ranging between ±5 integers (OK) and the other between ±10 integers (too large). If the larger-amplitude segment (“too large”) is not identified and corrected, then the PD2i values would be spuriously larger, as in FIG. 2B. A consequence of such spuriously larger PD2i values is that a PD2i-based test might make the wrong clinical prediction about the vulnerability of the patient's heart to lethal arrhythmogenesis.

According to exemplary embodiments, a larger-amplitude segment like that shown in FIG. 3 (too large) can be identified and corrected using the noise consideration algorithm (NCA).

Tables 1-4 show clinical data obtained as part of a study supporting the NCA concept. The goal of the study presented in Tables 1-4 was to predict the occurrence of arrhythmic death (AD) from a PD2i test performed on the digital ECG of each patient. In a study of 320 patients exhibiting chest pain in the Emergency Room who were determined to be at high cardiac risk with the Harvard Medical School protocol, approximately one out of 3 patients needed application of the NCA to provide meaningful data. If the NCA had not been developed and applied, then the data obtained from these patients would have been meaningless in those cases where the low-level noise was too large.

Table 1A shows the contingency table of predictive AD outcomes (i.e., true positive, true negative, false positive, false negative) and the Relative Risk statistic (REL) for the data set analyzed with several nonlinear deterministic algorithms (PD2i, DFA, 1/f-Slope, ApEn). Table 1B shows the contingency table of predictive AD outcomes and REL for the data set analyzed with the more usual linear stochastic algorithms (SDNN, meanNN, LF/HF, LF(ln)).

Tables 1A and 1B show comparison of HRV algorithms in 320 high-risk patients (N) presenting chest pain in the Emergency Department and having assessed risk of acute MI > 7%. All subjects had ECGs recorded and 12-month follow-up completed. The defined arrhythmic death outcomes are expressed as true or false predictions (T or F) by positive or negative HRV tests (P or N). Abbreviations in the tables are expressed as follows: SEN = sensitivity (%); SPE = specificity (%); REL = relative risk statistic; SUR = surrogate rejection; OUT = outlier rejection (>3 SDs); AF = atrial-fib rejection.

Nonlinear Deterministic Algorithms

TABLE 1A

        PD2i             DFA           1/fS                 ApEn
        (≦1.4 / >1.4)    (OUT / IN)    (≦−1.07 / >−1.07)    (≦1 / >1)
TP      19               6             6                    4
TN      140              52            158                  166
FP      96               227           75                   61
FN      1#               14            14                   16
SEN     95**             30            30                   20
SPE     59**             19            68                   73
SUR     65               15            65                   65
OUT     0                6             2                    8
REL     >>23**           0.12          0.91                 0.80
N       320              320           320                  320

Linear Stochastic Algorithms

TABLE 1B

        SDNN             MNN             LF/HF            LF(ln)
        (≦65 / >65)      (≦750 / >750)   (≦1.6 / >1.6)    (≦5.5 / >5.5)
TP      19               19              7                19
TN      59               163             196              96
FP      202              98              63               163
FN      6                6               18               6
SEN     76               76              28               76
SPE     23               62              76               37
AF      29               29              29               29
OUT     5                5               7                7
REL     0.93             4.57*           1.19             1.78
N       320              320             320              320

**p ≦ 0.001, Binomial Probability Test, with multiple-test alpha-protection (the alpha level required is 8-fold smaller); expansion of (P + Q)^n with 8-fold protection implies p = 0.00016, which is p ≦ 0.001; also p ≦ 0.001 by Fisher's Exact Test for row vs column associations in a 2 × 2 contingency table; all others are not significant by the Binomial Probability Test.
*p ≦ 0.001 by Fisher's Exact Test only; i.e., not significant by the Binomial Probability Test.
PD2i = Point Correlation Dimension (positive if minimum PD2i ≦ 1.4 dimensions, with a systematic low-dimensional excursion of more than 12 PD2i values); cases of randomized-phase surrogate rejections (SUR) were identical to the cases of FN ≦ 33%.
DFA-OUT = Detrended Fluctuation Analysis (α₁ [short-term] is positive if outside the normal range of 0.85 to 1.15); randomized-sequence surrogate rejections (SUR).
1/fS = 1/f Slope (positive if ≦ −1.075 for the slope of log[microvolts²/Hz] vs log[Hz], integrated over 0.04 Hz to 0.4 Hz).
ApEn = Approximate Entropy (positive with cut-point ≦ 1.0 units, slope distance).
SDNN = Standard deviation of normal beats (positive if ≦ 65 msec; for positive if ≦ 50 msec, TP = 17).
MNN = Mean of normal RR-intervals (positive if ≦ 750 msec).
LF/HF = Low-frequency power (0.04 to 0.15 Hz) / high-frequency power (0.15 to 0.4 Hz) (positive if ≦ 1.6).
LF(ln) = Low-frequency power (0.04 to 0.15 Hz), normalized by natural logarithm (positive if ≦ 5.5).
#This single AD patient died at 79 days and may not be a true FN; the digital ECG was recorded prior to two normal clinical ECGs, followed by a third positive one (i.e., the patient could be classified as an “evolving acute MI” who may have been TN at the time the ECG was recorded).

As can be seen from the data presented in Tables 1A and 1B, only the PD2i algorithm had statistically significant Sensitivity, Specificity, and Relative Risk statistics in this Emergency Room cohort.

Table 2 shows the Relative Risk statistic for various sub-groups of the high-risk cardiac patients. It is clear from the data presented in Table 2 that the PD2i performs best in all of them.

Table 2 shows the Relative Risk for algorithmic prediction of arrhythmic death in 320 high-risk cardiac patients in the Emergency Room.

TABLE 2

            AMI        Non-AMI      post-MI    non-post-MI
PD2i        7.39**     >12.17**     >4.51*     >16.85**
DFA         0.70       0.44         0.63       0.48
1/f Slope   1.67       0.56         0.87       0.90
ApEn        0.50       1.44         0.00       0.72
SDNN        0.68       1.75         0.83       1.34
MNN         1.94       >20.82**     3.00       3.61*
LF/HF       1.08       0.66         2.52       0.61
LF(ln)      1.08       >5.13*       0.73       2.09

**p ≦ 0.001, *p ≦ 0.05, Fisher Exact Test for row vs column association in a 2 × 2 contingency table.
The > sign means that the Relative Risk went to infinity because FN = 0; the value shown used FN = 1.
Relative Risk = True Positive/False Negative × [(True Negative + False Negative)/(True Positive + False Positive)].

Table 3 shows the performance of PD2i in the prediction of arrhythmic death in 320 high-risk cardiac patients in the Emergency Room, with and without the use of the NCA on the RR-interval data.

TABLE 3

        NCA USED      NCA NOT USED
TP      19            12
TN      140           140
FP      96            96
FN      1#            8
REL     >>23**        1.8#
N       320           320

(Positive test: PD2i ≦ 1.4; negative test: PD2i > 1.4.)
**p < 0.001
#Not statistically significant

Table 3 illustrates how the use or non-use of the NCA can change the study outcome for the Relative Risk statistic. Without the consideration of the noise, the PD2i would not have had such remarkable predictive performance, and none of the other algorithms would have worked very well either.

Table 4 illustrates another PD2i measurement criterion that also works well in predicting AD. Table 4 shows the percentage of all PD2i values between 0 and 3 degrees of freedom (dimensions) for 16 arrhythmic death patients, each of whom died within 180 days of their ECG recording, and their matched controls, each of whom had a documented acute myocardial infarction but did not die within 1 year of follow-up. The difference between the means of the two groups was highly statistically significant (P < 0.0000001, t-test).

TABLE 4

Arrhythmic Death (within 180 days)    Matched Controls (AMI, no AD 1-yr)
Patient ID    % PD2i in 0-3           Patient ID    % PD2i in 0-3
Bn032         95                      Ct024-n       13
Bn078         90                      Gr077-n       7
Bn100         90                      Bn126         0
Bn090         90                      Bn157         0
Bn113         70                      Bn138         1
Bn137-n       90                      Bn160         0
Bn159         98                      Bn167         *3
Bn141-n       97                      Ct001         0
Bn162         80                      B216          0
Bn215-n       55                      C002          5
Gr012         98                      Bn220         1
Bn226-n       95                      Ct005         6
Gr064         99                      Ct008-n       0
B227          95                      Ct022-n       0
Gr076         99                      Ct009         1
Gr056-n       65                      Gr047         5
Gr107         40                      C021          0
Gr111         90                      Gr090         0
Mean ± SD     83 ± 20**               Mean ± SD     2.3 ± 3.6**

*These values were due to excessive ectopic beats that produced some scaling in the 3 to 0 range.
**P < 0.000001, t-test; all AD subjects met the PD2i < 1.4 LDE and 0 < PD2i < 3.0 criteria; Sensitivity = 100%, Specificity = 100%.

As can be seen from Table 4, in the high-risk ER patients who do not die (negative test), the majority of their PD2i's are above 3 dimensions (degrees of freedom). In those patients who die (positive test), the majority of their PD2i's are below 3 dimensions. This % PD2i < 3 criterion completely separated the AD patients from their matched controls who had acute myocardial infarctions but who did not die of AD (sensitivity = 100%; specificity = 100%). These results too are completely dependent upon the use of the NCA to keep the distributions from overlapping and the Sensitivity and Specificity at 100%. Those subjects in which the noise bit was removed, that is, because the low-level noise in their RR-intervals was too high, are indicated by -n at the end of the file name.

PD2i Criteria for Predicting Arrhythmic Death

Each of the above Tables 1-4 was based on the observation of a low-dimensional excursion (LDE) to or below a PD2i of 1.4. That is, PD2i < 1.4 was the criterion for prediction of AD. There were no false negative (FN) predictions using this criterion. The FN case is anathema to medicine, as the patient is told, “you are OK,” but then he or she goes home to die of AD within a few days or weeks. False positive cases are expected in great numbers, as the cohort is a high-risk one having patients with acute myocardial infarctions, monomorphic ectopic foci, and other high-risk diagnoses. These positive-test patients are certainly at risk and should be hospitalized, but they will not die, perhaps because of the drugs or surgical interventions that are applied in the hospital. In other words, the FP classification is not anathema to medicine. What is significant about the application of the PD2i to these ER patients is that (1) all ADs occurred in positive-test patients, and (2) 51% of the negative-test patients could be safely discharged from the hospital, as none died within the year of follow-up. All of these clinical results are meaningful, but are completely dependent upon the use of the NCA to keep the Sensitivity and Specificity at 100% and the Relative Risk high.

FIGS. 4A-4L illustrate the PD2i < 1.4 LDEs and the % PD2i < 3 criteria, both of which would have been changed significantly had the NCA not been used in some cases. Although they are related to one another, the use of both criteria on NCA-examined data is probably the best and most universal way to predict AD among high-risk cardiac patients. This combination keeps statistical Sensitivity and Specificity at 100%, as seen for the AD patients and their acute MI controls (Table 4; FIGS. 4A-4L).

FIGS. 4A-4F illustrate low-level noise in the RR-intervals of 6 acute myocardial infarction (acute MI) control patients, and FIGS. 4G-4L illustrate low-level noise in the RR-intervals of 6 arrhythmic death (AD) patients.

The long segment in each panel represents all of the RR-intervals in the 15-minute ECG. The short segment displays the low-level noise traces from a small 20-beat segment at a higher gain. Thus, in each panel, the noise is superimposed upon larger dynamic activity. All gains are the same for all subjects (long RR trace = 500 to 1000 integers; short RR trace = 0 to 40 integers).

Those subjects with a noise range judged to be larger than ±5 integers (1 msec = 1 integer) had the noise consideration algorithm (NCA) performed before the PD2i was calculated. Thus, for example, the NCA was applied for the control subjects represented in FIGS. 4B, 4C, and 4F and for the AD subjects represented in FIGS. 4K and 4L.

The PD2i values corresponding to each RRi are displayed on a scale of 0 to 3 dimensions (degrees of freedom). For the AD subjects, as represented in FIGS. 4G-4L, there are many PD2i values less than 3.0. Table 4 shows this to be a mean of 83% of PD2i's below 3.0 for the AD subjects.

The predictability outcomes for the clinical data would not have been statistically significant without considering the noise content of the data. The NCA actually used in all of the above applications involved (1) observing whether or not the dynamic range of the noise was outside a 10-integer interval and then, if it was, (2) reducing the amplitude of the RRs sufficiently to get rid of the excess noise. The NCA was required in approximately ⅓ of the subjects. Rather than multiplying each data point by a value that would just reduce the dynamic range of the noise to under 10 integers, the multiplier was 0.5 (i.e., it removed a whole bit of the 12-bit data).
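
For integer data, this multiplier of 0.5 is exactly a one-bit right shift, which is why it removes the noise-carrying low-level bit of the 12-bit data; a trivial illustration (the value is hypothetical):

```python
x = 2837                 # a hypothetical 12-bit RR value (0-4095)
assert x // 2 == x >> 1  # multiplying by 0.5 drops the lowest bit
```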

All applications of the NCA were done blinded to the data outcome (arrhythmic death was determined only after the PD2i analyses with the NCA were completed). This procedure excludes the possibility of experimenter bias and is a required design for statistical analyses.

According to an exemplary embodiment, the noise consideration algorithm as described above may be implemented in software. Determination of the noise interval may be made visually, based on data displayed, e.g., on a computer monitor. The data may be displayed at a fixed magnification, e.g., ±40 integers full-scale centered around the mean of the segment displayed. If the values are outside the ±5 integer range, the user may decide to divide the data series by a predetermined value, or the division may occur automatically.

FIG. 5A illustrates an exemplary flow diagram for the logic of the NCA applied to ECG data. According to an exemplary embodiment, ECG from the subject is collected by a conventional amplifier, digitized, and then given as input to a computer for analysis. First, RR and QT intervals are made from the ECG data; then they are analyzed by the PD2i software (PD2-02.EXE) and the QT vs RR-QT software (QT.EXE).

According to exemplary embodiments, the NCA is applied at two points, e.g., as part of the execution of the PD2i and QT vs RR-QT software and after execution of the PD2i and QT vs RR-QT software. For example, the NCA may be applied during execution of the PD2i and QT vs RR-QT software so that the slope of log C(r, n, nref*) vs. log r is set to zero if the slope is less than 0.5 and greater than zero. Also, the NCA may be applied after execution of the PD2i and QT vs RR-QT software to divide the data series by a predetermined integer if the low-level noise is outside a predetermined interval, e.g., outside the interval between −5 and 5. If such division occurs, the PD2i calculation is repeated for the divided data by executing the PD2i and QT vs RR-QT software again.
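
Combining the two application points, a sketch of the overall flow is given below. Here `compute_pd2i` stands in for the PD2-02.EXE analysis (its internals are not specified here), and `noise_consideration` and `apply_slope_rule` are the hypothetical helpers sketched earlier; the slope rule is assumed to run inside the analysis itself.

```python
def pd2i_with_nca(rr, compute_pd2i, noise_limit=5):
    """Sketch: NCA applied during analysis (slope rule, assumed to run
    inside compute_pd2i) and after analysis (noise-range check; divide
    by 2 and re-run if the noise exceeds +/-noise_limit)."""
    series = compute_pd2i(rr)                        # first pass
    corrected = noise_consideration(rr, limit=noise_limit)
    if corrected is not rr:                          # noise was out of range
        series = compute_pd2i(corrected)             # repeat on divided data
    return series
```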

After execution of the PD2i and QT vs RR-QT software is completed, the Point Correlation Dimension is then calculated as a function of time and displayed. The QT vs RR-QT plot is also made and displayed. Graphics Reports are then made for assessing risk. The digitized ECG may be offloaded for storage.

The descriptions above relate largely to improving the detection/prediction of deterministic low-dimensional excursions in non-stationary heartbeat intervals made from ECG data as a harbinger of fatal cardiac arrhythmias. The descriptions above also relate to improving the detection of dynamics of QT vs RR-QT jointly-plotted heartbeat subintervals, in a previously observed exclusion area, as harbingers of fatal cardiac dynamical arrhythmias. It will be appreciated, however, that the invention is also applicable to improving the detection/prediction of other biological anomalies using, e.g., electroencephalographic (EEG) data. For example, the NCA may be applicable to improve the detection of persistent alterations in the deterministic dimensional reconstructions made from non-stationary EEG data as a measure of altered cognitive state. The NCA may also be applicable to improve detection of an enlarged variance in the deterministic dimensional variations in EEG potentials as a harbinger of early paroxysmal epileptic activity.

FIG. 5B shows an exemplary implementation of the NCA algorithm for an epilepsy patient or normal subject undergoing neural analysis. EEG data from the subject is made by a conventional amplifier, digitized, and then given as input to a computer for analysis. The PD2i software (PD2-02.exe) is then executed, setting the slope to, e.g., zero as necessary. Next, if the low-level noise is outside a predetermined interval, the PD2i data series is divided by a predetermined integer, and the PD2i calculation is repeated for the divided data by executing the PD2i software again.

The Point Correlation Dimension is then plotted, and a Graphics Report is then made for assessing the location of epileptic foci and/or alteration of cognitive state.

The NCA may be implemented on, e.g., a microcomputer. Although shown as separate elements, one or all of the elements shown in FIG. 5A and FIG. 5B may be implemented in the CPU.

Although the focus of the description above has been mainly on the assessment of ECG data and EEG data, it will be appreciated that other similar applications of the invention are possible. The source of the electrophysiological signal may be different, and the structure of the graphics report(s) may be specific to the medical and/or physiological objectives. All analyses may use the PD2i algorithm and the NCA in some software form and may be accompanied by other confirmatory analyses.

FIG. 6 is a flow chart illustrating a process by which the NCA may be implemented in software according to an exemplary embodiment. The flow begins with collection of the data. From the data, the i- and j-VECTORs are made and subtracted from one another (i-j DIFF). These vector-difference lengths are entered, according to their value (X, 1 to 1000), into the MXARAY at the embedding dimension used (m, 1 to 12). The entry is made as an increment of a counter at each location of the MXARAY.

After completion of the making of the vector-difference lengths, the counter numbers (3, 7, 9, 8, 2, 6, 7, 4 . . . ) are then used to make the correlation integrals for each embedding dimension; this is done by making a cumulative histogram as a function of X, at each m, and then making the log-log plot of their cumulative values (e.g., PLOT log C(n,r) vs log r). The cumulative histogram results in the log-log data plotted in the correlation integral for each embedding dimension (m).
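
A sketch of this counting step is shown below, with X running 1 to 1000 and m 1 to 12 as stated above; the mapping of vector-difference lengths onto integer bins and the single fixed reference index are illustrative assumptions.

```python
import numpy as np

def correlation_integrals(x, m_max=12, x_bins=1000, tau=1, ref=0):
    """Sketch: increment MXARAY counters for each i-j vector-difference
    length, then form the correlation integral at each embedding
    dimension m as a cumulative histogram of the counts."""
    span = np.ptp(x) + 1e-12                  # data range (avoids divide-by-0)
    mxaray = np.zeros((m_max, x_bins), dtype=int)
    for m in range(1, m_max + 1):
        n = len(x) - (m - 1) * tau
        vecs = np.column_stack([x[k * tau : k * tau + n] for k in range(m)])
        diffs = np.max(np.abs(vecs - vecs[ref]), axis=1)   # i-j DIFF lengths
        bins = np.minimum((diffs / span * (x_bins - 1)).astype(int), x_bins - 1)
        for b in np.delete(bins, ref):
            mxaray[m - 1, b] += 1             # counter increment at (m, X)
    # cumulative histogram over X gives C(n, r) at each m; the log-log
    # plot of these cumulative values is the correlation integral
    return np.cumsum(mxaray, axis=1)
```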

The correlation integral is then tested against five criteria. First, it is determined whether the slope at each m is less than 0.5. If the slope is less than 0.5, it is set to zero. Next, the longest linear scaling region that is within the linearity criterion (LC) is found. This is accomplished by examining each correlation integral by the LC to find the longest segment of the second derivative that falls within the limits of the set parameter (LC=0.30 means within a + to − deviation of 15% of the mean slope); this iterative LC test will find a range above the “floppy tail” (i.e., the smallest log-r region that is unstable because of finite data length) and run up the correlation integral until the LC criterion is exceeded (bold section of the top correlation integral).

Next, a determination is made whether the segment is within the plot length criterion (PL). If so, then the correlation integral scaling region is reset by the PL criterion; this value is set from the smallest data point in the correlation integral to its criterion value (e.g., 15%; bracket in the second-from-top correlation integral). The upper and lower limits of this region are observed to see if they have at least the number of data points required by the minimum scaling (MS) criterion, e.g., 10. The selected regions of all correlation integrals (m=1 to m=12) are plotted and examined by the CC to see if convergence occurs at the higher embedding dimensions (e.g., m=9 to m=12); that is, to see if the selected regions have essentially the same slopes, in which the standard deviation around the mean is within the limits set by the CC (i.e., CC=0.40 means that the deviation around the mean is within + to − 20% of the mean value). If the CC criterion is passed, then the mean slope and standard deviation are stored to file and, e.g., displayed.
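
A sketch of this CC test, consuming the per-m slopes produced by the earlier criteria, follows; the helper and its inputs are hypothetical, with CC=0.40 meaning a ±20% deviation about the mean, per the parenthetical above.

```python
import numpy as np

def convergence_check(slopes_by_m, CC=0.40, m_lo=9, m_hi=12):
    """Sketch: slopes of the selected scaling regions at m = m_lo..m_hi
    converge when their SD is within +/-(CC/2) of their mean. Mean and
    SD are kept either way; `passed` filters later plotting."""
    slopes = np.array([slopes_by_m[m] for m in range(m_lo, m_hi + 1)])
    mean, sd = slopes.mean(), slopes.std()
    passed = sd <= (CC / 2) * abs(mean)
    return mean, sd, passed
```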

Finally, the low-level noise is examined by the user to test whether the dynamic range is outside the −5 to +5 interval. If so, then the noise bit is removed from the data file (i.e., each data point value is divided by 2), and the modified file is then re-calculated, displayed, and stored.

If failure occurs at any of the early criteria (LC, PL, MS) within the flow, then the program will exit, move the PD2i reference vector to the next data point, and start all over. If failure occurs at the CC, the mean and standard deviation are saved without exiting, for it may be the case that the CC is later desired to be changed; i.e., the CC is a filter that determines whether or not the PD2i (i.e., the mean slope of m=9 to m=12) will be plotted in later graphical routines.

While the invention has been described with reference to specific embodiments, modifications and variations of the invention may be constructed without departing from the scope of the invention. For example, although the NCA has been described in its application to a PD2i data series, it should be appreciated that the NCA may also be useful in reducing noise in other types of algorithms, e.g., D2, D2i, or any other predictive algorithm.

It should be understood that the foregoing description and accompanying drawings are by example only. A variety of modifications are envisioned that do not depart from the scope and spirit of the invention.

The above description is intended by way of example only and is not intended to limit the present invention in any way.

CLAIMS

1. A method of detecting or predicting biological anomalies, comprising the steps of: analyzing input biological or physical data using a data processing routine including a set of application parameters associated with biological data correlating with the biological anomalies to produce a data series; determining whether a slope of the data series is smaller than a predetermined value; if the slope is less than the predetermined value, setting the slope to a predetermined number; and using the data series to detect or predict the onset of the biological anomalies.
2. The method of claim 1, wherein the predetermined value is approximately 0.5.
3. The method of claim 1, wherein the predetermined number is zero.
4. The method of claim 1, further comprising: determining a noise interval within the data series; and, if the noise interval is within a predetermined range, dividing the data series by another predetermined number and repeating the step of analyzing to produce new values for the data series.
5. The method of claim 4, wherein the other predetermined number is two.
6. The method of claim 4, wherein the predetermined range is −x to +x, where x is any number.

7. The method of claim 6, wherein the predetermined range is −5 to +5.

8. The method of claim 1, wherein the input biological or physical data includes electrophysiological data.
9. The method of claim 8, wherein the input biological or physical data includes ECG data that is analyzed to detect or predict the onset of at least one of cardiac arrhythmias and cerebral epileptic seizure and/or to measure the severity of myocardial ischemia.
10. An apparatus for detecting or predicting biological anomalies, the apparatus comprising: means for analyzing input biological or physical data using a data processing routine including a set of application parameters associated with biological data correlating with the biological anomalies to produce a data series; means for determining whether a slope of the data series is smaller than a predetermined value; means for setting the slope to a predetermined number if the slope is less than the predetermined value; and means for using the data series to detect or predict the onset of the biological anomalies.
11. The apparatus of claim 10, wherein the predetermined value is approximately 0.5.
12. The apparatus of claim 10, wherein the predetermined number is zero.
13. The apparatus of claim 10, further comprising: means for determining a noise interval within the data series; and means for dividing the data series by another predetermined number if the noise interval is within a predetermined range and providing the divided data series to the analyzing means for producing new values for the data series.
14. The apparatus of claim 13, wherein the other predetermined number is two.
15. The apparatus of claim 13, wherein the predetermined range is −x to +x, where x is any number.

16. The apparatus of claim 15, wherein the predetermined range is −5 to +5.

17. The apparatus of claim 10, wherein the input biological or physical data includes electrophysiological data.
18. The apparatus of claim 17, wherein the input biological or physical data includes ECG data that is analyzed to detect or predict the onset of at least one of cardiac arrhythmias and cerebral epilepsy and/or to measure the severity of myocardial ischemia.